Exploration server for computer science research in Lorraine

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Scan-to-XML : Using Software Component Algebra for Intelligent Document Generation

Internal identifier: 008307 (Main/Exploration); previous: 008306; next: 008308

Authors: Bart Lamiroy; Laurent Najman

RBID : CRIN:lamiroy02a

English descriptors

browsable documents; component algebra; document analysis; scripting

Abstract

The main objective of this paper is to experiment with a new approach to developing a high-level document analysis platform by composing existing components from a comprehensive library of state-of-the-art algorithms. Starting from the observation that document analysis is conducted as a layered pipeline that takes syntax as input and produces semantics as output at each layer, we introduce the concept of a Component Algebra as an approach to integrating different existing document analysis algorithms in a coherent and self-contained manner. Based on XML for data representation and exchange on the one hand, and on combined scripting and compiled libraries on the other, our claim is that this approach can eventually lead to a universal representation for real-world document analysis algorithms. The test case for this methodology is a fully automated method for generating a browsable, hyperlinked document from a simple scanned image. Our example is based on cutaway diagrams, which have the advantage of containing simple "browsing semantics": they consist of a clearly identifiable legend containing index references, plus a drawing containing one or more occurrences of the same indices.
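
The layered pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record does not describe their algebra or library, so the `compose` operator, the three component functions, the dictionary layout, and the sample legend and positions are all hypothetical, chosen only to show components that each take a document and return it enriched with one more layer of semantics.

```python
from functools import reduce
from typing import Any, Callable, Dict

# Hypothetical representation: a document is a plain dictionary that each
# component enriches with one more layer of interpretation.
Document = Dict[str, Any]
Component = Callable[[Document], Document]


def compose(*components: Component) -> Component:
    """Chain components left to right into a single component."""
    return lambda doc: reduce(lambda acc, comp: comp(acc), components, doc)


def find_legend(doc: Document) -> Document:
    # Assumed layer: collect the text items flagged as legend entries.
    legend = {t["index"]: t["label"] for t in doc["texts"] if t.get("in_legend")}
    return {**doc, "legend": legend}


def find_occurrences(doc: Document) -> Document:
    # Assumed layer: every other text item is an index occurrence in the drawing.
    occurrences = [t for t in doc["texts"] if not t.get("in_legend")]
    return {**doc, "occurrences": occurrences}


def link_indices(doc: Document) -> Document:
    # Assumed layer: pair each occurrence with the legend entry of the same index.
    links = [
        {"index": o["index"], "position": o["position"],
         "target": doc["legend"][o["index"]]}
        for o in doc["occurrences"]
        if o["index"] in doc["legend"]
    ]
    return {**doc, "links": links}


pipeline = compose(find_legend, find_occurrences, link_indices)

# Invented sample data standing in for the output of text recognition.
scanned = {"texts": [
    {"index": "1", "label": "Piston", "in_legend": True},
    {"index": "2", "label": "Valve", "in_legend": True},
    {"index": "1", "position": (120, 45)},
    {"index": "2", "position": (300, 210)},
    {"index": "1", "position": (410, 80)},
]}
result = pipeline(scanned)
for link in result["links"]:
    print(link["index"], link["position"], "->", link["target"])
```

Because every component has the same signature, the composed pipeline is itself a component and can be reused as a step in a larger pipeline.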



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" wicri:score="622">Scan-to-XML : Using Software Component Algebra for Intelligent Document Generation</title>
</titleStmt>
<publicationStmt>
<idno type="RBID">CRIN:lamiroy02a</idno>
<date when="2002" year="2002">2002</date>
<idno type="wicri:Area/Crin/Corpus">003508</idno>
<idno type="wicri:Area/Crin/Curation">003508</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Curation">003508</idno>
<idno type="wicri:Area/Crin/Checkpoint">000F95</idno>
<idno type="wicri:explorRef" wicri:stream="Crin" wicri:step="Checkpoint">000F95</idno>
<idno type="wicri:Area/Main/Merge">008763</idno>
<idno type="wicri:Area/Main/Curation">008307</idno>
<idno type="wicri:Area/Main/Exploration">008307</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Scan-to-XML : Using Software Component Algebra for Intelligent Document Generation</title>
<author>
<name sortKey="Lamiroy, Bart" sort="Lamiroy, Bart" uniqKey="Lamiroy B" first="Bart" last="Lamiroy">Bart Lamiroy</name>
</author>
<author>
<name sortKey="Najman, Laurent" sort="Najman, Laurent" uniqKey="Najman L" first="Laurent" last="Najman">Laurent Najman</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>browsable documents</term>
<term>component algebra</term>
<term>document analysis</term>
<term>scripting</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en" wicri:score="5410">The main objective of this paper is to experiment with a new approach to developing a high-level document analysis platform by composing existing components from a comprehensive library of state-of-the-art algorithms. Starting from the observation that document analysis is conducted as a layered pipeline that takes syntax as input and produces semantics as output at each layer, we introduce the concept of a Component Algebra as an approach to integrating different existing document analysis algorithms in a coherent and self-contained manner. Based on XML for data representation and exchange on the one hand, and on combined scripting and compiled libraries on the other, our claim is that this approach can eventually lead to a universal representation for real-world document analysis algorithms. The test case for this methodology is a fully automated method for generating a browsable, hyperlinked document from a simple scanned image. Our example is based on cutaway diagrams, which have the advantage of containing simple "browsing semantics": they consist of a clearly identifiable legend containing index references, plus a drawing containing one or more occurrences of the same indices.</div>
</front>
</TEI>
<affiliations>
<list></list>
<tree>
<noCountry>
<name sortKey="Lamiroy, Bart" sort="Lamiroy, Bart" uniqKey="Lamiroy B" first="Bart" last="Lamiroy">Bart Lamiroy</name>
<name sortKey="Najman, Laurent" sort="Najman, Laurent" uniqKey="Najman L" first="Laurent" last="Najman">Laurent Najman</name>
</noCountry>
</tree>
</affiliations>
</record>
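
As a rough illustration of how the record above could be read outside Dilib, the following Python sketch extracts the title, authors and keywords with the standard `xml.etree.ElementTree` module. The `wicri:` prefix is not declared anywhere in the record as shown, so a namespace-aware parser rejects it; the sketch works around this by binding the prefix to a placeholder URI. That URI, the `parse_record` helper and the abridged copy of the record are assumptions made for the example, not part of Wicri or Dilib.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record above; only the fields read below are kept.
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" wicri:score="622">Scan-to-XML : Using Software Component Algebra for Intelligent Document Generation</title>
</titleStmt>
<sourceDesc>
<biblStruct>
<analytic>
<author>
<name sortKey="Lamiroy, Bart" first="Bart" last="Lamiroy">Bart Lamiroy</name>
</author>
<author>
<name sortKey="Najman, Laurent" first="Laurent" last="Najman">Laurent Najman</name>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>browsable documents</term>
<term>component algebra</term>
<term>document analysis</term>
<term>scripting</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
</TEI>
</record>"""

# Assumption: placeholder URI. The record does not declare the wicri prefix,
# so it is bound here only to make the text parseable.
WICRI_NS = "urn:example:wicri"


def parse_record(text: str) -> dict:
    """Return the title, author names and keywords of one record."""
    bound = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    root = ET.fromstring(bound)
    return {
        "title": root.findtext(".//titleStmt/title"),
        "authors": [n.text for n in root.findall(".//analytic/author/name")],
        "keywords": [t.text for t in root.findall(".//keywords/term")],
    }


info = parse_record(RECORD)
print(info["title"])
print("; ".join(info["authors"]))
print("; ".join(info["keywords"]))
```

Binding the prefix on the root element leaves the rest of the record untouched, so the same helper should also accept the full record text.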

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 008307 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 008307 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Lorraine
   |area=    InforLorV4
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     CRIN:lamiroy02a
   |texte=   Scan-to-XML : Using Software Component Algebra for Intelligent Document Generation
}}

This area was generated with Dilib version V0.6.33.
Data generation: Mon Jun 10 21:56:28 2019. Site generation: Fri Feb 25 15:29:27 2022